In statistics, a design matrix is a matrix of explanatory variables, often denoted by X, that is used in certain statistical models, e.g., the general linear model.[1][2] It can contain indicator variables (ones and zeros) that indicate group membership in an ANOVA.
The design matrix represents the independent variables in statistical models that describe observed data (often called dependent variables) in terms of other known variables (explanatory variables). The theory of such models makes substantial use of matrix manipulations involving the design matrix: see for example linear regression. A notable feature of the design matrix is that it can represent a number of different experimental designs and statistical models, e.g., ANOVA, ANCOVA, and linear regression.
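As a rough illustration of the kind of matrix manipulation referred to above, the following sketch fits a simple linear regression by least squares using a design matrix X. It assumes NumPy (the text names no software), and the data values and variable names (`x`, `y`, `X`, `beta`) are made up for illustration only.

```python
import numpy as np

# Illustrative (made-up) data: one explanatory variable x and responses y.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # constructed so that y = 1 + 2*x exactly

# Design matrix: a column of ones (intercept) next to the explanatory variable.
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of the coefficients in y = X @ beta + error.
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(X)
print(beta)
```

Each row of X corresponds to one observation and each column to one term of the model, so changing the model amounts to changing the columns of X while the fitting step stays the same.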
As an example, consider a one-way analysis of variance (ANOVA) with 3 groups and 7 observations. The first column of the design matrix models the grand (global) mean of the ys, while the remaining 3 columns indicate the group membership of each observation. Here the first group consists of the first 3 observations, and the next two groups each consist of 2 observations.
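The design matrix described in this example can be written out explicitly. The sketch below builds it with NumPy (an assumption; the text names no software), using the hypothetical names `groups`, `intercept`, `indicators`, and `X`. Group labels 0, 1, and 2 stand for the three groups in the order given above.

```python
import numpy as np

# Group label of each of the 7 observations:
# the first 3 belong to group 0, the next 2 to group 1, the last 2 to group 2.
groups = np.array([0, 0, 0, 1, 1, 2, 2])

# First column: all ones, modelling the grand mean.
intercept = np.ones((groups.size, 1), dtype=int)

# Remaining columns: one 0/1 indicator column per group.
indicators = (groups[:, None] == np.arange(3)).astype(int)

# 7 x 4 design matrix: grand-mean column followed by the 3 indicator columns.
X = np.hstack([intercept, indicators])
print(X)

# Column rank of the design matrix.
rank = np.linalg.matrix_rank(X)
print(rank)
```

In this parameterization the three indicator columns add up to the first column, so the matrix has 4 columns but rank 3. The coefficients are therefore not uniquely determined without an additional constraint, such as dropping one indicator column or requiring the group effects to sum to zero.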